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Abstract 

Recent advances in signal processing have focused on the use of sparse representations in various 
applications. A new field of interest based on sparsity has recently emerged : compressed sensing. 
This theory is a new sampling framework that provides an alternative to the well-known Shannon 
sampling theory. In this paper we investigate how compressed sensing (CS) can provide new insights 
into astronomical data compression and more generally how it paves the way for new conceptions in 
astronomical remote sensing. We first give a brief overview of the compressed sensing theory which 
provides very simple coding process with low computational cost, thus favoring its use for real-time 
applications often found on board space mission. We introduce a practical and effective recovery algorithm 
for decoding compressed data. In astronomy, physical prior information is often crucial for devising 
effective signal processing methods. We particularly point out that a CS-based compression scheme is 
flexible enough to account for such information. In this context, compressed sensing is a new framework 
in which data acquisition and data processing are merged. We show also that CS provides a new fantastic 
way to handle multiple observations of the same field view, allowing us to recover information at very 
low signal-to-noise ratio, which is impossible with standard compression methods. This CS data fusion 
concept could lead to an elegant and effective way to solve the problem ESA is faced with, for the 
transmission to the earth of the data collected by PACS, one of the instruments on board the Herschel 
spacecraft which will launched in 2008. 
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Introduction 

From year to year, the quantity of astronomical data increases at an ever growing rate. In part this is 
due to very large digital sky surveys in the optical and near infrared, which in turn has been made possible 
by the development of digital imaging arrays such as CCDs (charge-coupled devices). The size of digital 
arrays is continually growing, pushed by the demands of astronomical research for ever larger quantities 
of data in ever shorter time periods. As a result, the astronomical community is also confronted with 
a rather desperate need for data compression techniques. Several techniques have in fact been used, or 
even developed, for astronomical data compression. Veran [1] studied lossless techniques. White et al. [2] 
developed HCOMPRESS, based on the Haar wavelet transform, and Press et al. [3] developed FITSPRESS 
based on the Daubechies wavelet transform. In addition, the scientist must of course consider JPEG, a 
general purpose standard. Effective and efficient compression based on the multiresolution Pyramidal 
Median Transform (PMT) algorithm was developed by Starck et al. [4]. Huang and Bijaoui [5] used 
mathematical morphology in MathMorph for astronomical image processing. 

For some projects, we need to achieve huge compression ratios, which cannot be obtained by current 
methods without introducing unacceptable distortions. For instance, it was shown [6] that if we wish to 
extend the GAIA mission in order to make a high-spatial resolution all-sky survey in the visible based on 
a scanning satellite, then the main limitation is the amount of collected data to be transmitted. A solution 
could be to introduce all our knowledge of both the sky and the instrument in order to compress only 
the difference between what we know and what we observe [6]. However, errors on the point spread 
functions, positions of stars, etc., must be under control [6] and the computation cost on board of the 
satellite may be unacceptable. The Herschel satellite!^, which will be launched in 2008, is faced with a 
similar problem. Indeed the photometer data need to be compressed by a factor of 16 to be transferred. 
The yet implemented lossless compression scheme (based on entropy coding) yield a compression rate of 
2.5. ES^y is in need of a compression ratio of 6. As the CPU load has to be extremely small, conventional 
compression methods cannot be used. 

Recently, an alternative sampling theory has emerged which shows that signals can be recovered 

from far fewer samples (measurements) than what the Nyquist/Shannon sampling theory states. This 

new theory coined compressed sensing or (compressive sensing) (CS) introduced in the seminal papers 

*See http ://www.esa.int/science/herschel 
^See http ://www.esa.int. 



3 



[7], [8], [9] relies on the compressibility of signals or more precisely on the property for some signals 
to be sparsely represented. In a more general setting, sparsity is known to entail effective estimation 
(restoration, blind source separation • • • etc.), efficient compression or dimension reduction. From the 
compressed sensing viewpoint, sparse signals could be acquired "economically" (from a few samples) 
without loss of information. It introduces new conceptions in data acquisition and sampling. It has been 
shown that CS could be useful in many domains such as medical imaging [10], biosensing [11], radar 
imaging [12] or geophysical data analysis [13]. 

Scope of the paper : We propose a new alternative approach for the transmission of astronomical 
images, based on CS. Similarly to classical compression schemes, CS can be arranged as a "Coding- 
Decoding" two-stage scheme. In practical situations (more particularly for on board applications), CS 
provides a particularly simple coding stage that only requires a low computational cost. Most of the 
computational complexity is then carried by the decoding step. In this context, we introduce a new 
decoding algorithm that quickly and accurately provides close solutions to the decoding problem. Section U 
reviews the principle of the CS theory. Section Ull shows how CS can be used in astronomy and presents 
a decoding algorithm. More generally, we introduce a new conception of astronomical remote sensing ; 
we particularly show that the CS framework is able to account for specific physical priors. It paves the 
way for new instrument design in which data acquisition, compression and processing can be merged. 
In section III-CI we show how CS offers us a new data fusion framework when multiple observations of 
the same field of view are available. This happens very often in astronomical imaging when we need to 
build a large map from a micro-scan or a raster-scan strategy. Section |lll] emphasizes on the effectiveness 
of the proposed CS-based compression for solving the Herschel data compression problem. Indeed, we 
show the advantage of CS over the averaging approach which has been considered so far. 

I. An overview of compressed sensing theory 

In this section, we give a brief and non exhaustive review of compressed sensing and show how this 
new sampling theory will probably lead to a "revolution" in signal processing and communication theory. 
For more exhaustive tutorials in this field, we refer the reader to the review papers [14], [15]. Assume 
j; G R* (written as a column vector with t entries) such that we "observe" or "measure" only M < t 
samples {yk}k=i,--- ,m- These measures are obtained by projecting the signal x on a set of so-called 
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measurement vectors {9k}k=i, - ,m as follows : 



Vk = (x,6k 



(1) 



The backbone of compressed sensing relies on two major concepts : i) the data to compress are indeed 
compressible; more precisely the data x have a "structured" content so that they can be sparsely 
represented in some basis ii) the measurement vectors {Ok}k=i,- ,m are non adaptive (they should 
not depend on x) and incoherent with the basis in which x is assumed to be sparse. 

A. The gist of compressed sensing 

Compressibility: Most "natural" images or signals have highly structured contents (i.e. contours and 
textures in image processing). Recent advances in harmonic analysis have provided tools that efficiently 
represent such structures (wavelets, ridgelets [16], curvelets [17], [18], contourlets [19], to name a few). 
In this context, efficient representations mean sparse representations. Let's consider a signal x of size t. 
Assume that x can be represented from T >t signal waveforms {4>i}i=i,... ,t '• 



This relation can be more conveniently recast in matrix formulation : x = ^a. The signal x is said 
to be sparse in $ if most entries of the so-called coefficient vector a are zero or close to zero and 
thus only a few have significant amphtudes. Such signal x can be efficiently approximated (with low £2 
approximation error) from only a few significant coefficients. In the extreme case, x is K-sparse : x can 
be exactly synthesized from K <^t coefficients. Then such sparse signal is highly compressible as the 
knowledge of only K parameters is needed to perfectly reconstruct the signal x. 

Note that, in the last decade, sparsity has emerged as one of the leading concepts in a wide range of signal 
processing applications (restoration [20], feature extraction [21], source separation [22], compression 
([23], [24]), to name only a few). 

Recently, a wide range of theoretical and practical studies have focused on sparse decomposition problems 
in overcomplete (the case T > t) signal waveform dictionaries (see [25] and references therein). In this 
paper we will mainly focus on sparsity assumptions in orthonormal bases Extensions to overcomplete 
dictionary would be straightforward in the hght of the aforementioned references. 
From now we assume that x have a K-sparse decomposition in the orthobasis The data x are then 



T 




(2) 



i=l 
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compressible ; the next problem then amounts to accounting for signal compressibility to devise efficient 
non-adaptive signal compression. 

Incoherence of the measurements: As an intensive field of research, several works have already 
addressed compressed sensing in various settings (see [26], [7], [27] and references therein). In the 
aforementioned references, the way the measurements are designed plays a crucial role. Let us assume 
that the signal x G M* is a highly compressible K-sparse signal in the orthobasis In compressed 
sensing, measurements are simple linear projections {yk}k=i, -- ,M '■ Uk = (^x,(^k^- Historical works 
considered measurements from random ensembles (see [26], [7], [8], [27] and references therein). In 
these seminal papers, randomness is likely to provide incoherent projections. Recall that the coherence 
between two matrices is measured by their mutual coherence (see [28], [15]) : 



/i© ^ = max 



(3) 



In practical situations, measurement vectors are designed by selecting at random a set (indexed by A) of 
vectors from a deterministic ensemble as suggested in [29], [15] : y = ©ax. 

B. Signal recovery 

a ) Exact solutions: The previous paragraph emphasized on the way the coding/sensing step should 
be devised. The decoding step amounts to recover the original signal x out of the compressed signal 
y = ©AX. Furthermore, x is known a priori to be ii'-sparse in $ : x = $a where a is a sparse vector 
of size t. Then the recovery problem boils down to the following sparse decomposition issue in the 
overcomplete system ©a^ : 

mill ||a||£„ s.t. y = ©A^ct (4) 

a 

In the last decade, sparse decomposition issues have been a very active field. Strong recovery results 
have been provided (see [28], [25], [30]). Classically, the £o norm is substituted with the convex £i-norm 
to avoid the combinatorial nature of the problem in Equation The recovery problem is then recast in 
a convex optimization program : 

min \\a\\i s.t. y = ©A^a (5) 

a 

Equivalence between these problems has led to a considerable literature (see [25] and references therein). 
At first sight, the decoding step in compressed sensing is equivalent to a sparse decomposition problem 
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in an overcomplete system ^. Formally, the specificity of CS relies on the particular structure of the 
overcomplete representation at hand : ^' = 0a$. Several strong recovery results in the particular CS 
framework have been proved based on specific assumptions with random measurement ensembles (see 
[31], [26], [7], [32]). In practice, as stated earlier, measurements are more conveniently devised from 
random subsets of deterministic ensembles. 

b) Approximate solutions: In practice, signals are seldom iiT-sparse. Furthermore, the data are often 
corrupted by noise. A more realistic compression model would be the following : 

y = @K{x + n) (6) 

where n is a white Gaussian noise with variance u^. As the measurement matrix ©a is a sub-matrix 
of the orthonormal matrix 0, the projected noise n\ = ©An is still white and Gaussian with the same 
variance o"^. The projected data are then recast as follows : y = @kx + nx- The recovery step then boils 
down to solving the next optimization problem : 

min||a||£j s.t. ||y — 0A^a|L <e (7) 

where e is an upper bound of HnH^^. Defining e = \/t + 2\/2tan provides a reasonable upper bound on 
the noise £2 norm, with overwhelming probability. This problem is known as the LASSO in statistics [33] 
or Basis Pursuit denoising [34]. In the noiseless case (e = 0), it has been shown in [35] that the solution 
to the problem in Equation ^ leads to an approximation error close to the optimal sparse approximation. 
The optimal sparse approximations would be obtained by reconstructing x from its K most significant 
coefficients in $ (if they were known !). In the noiseless case, the solution to the problem in Equation (|7]) 
is also shown to provide stable solutions. 

The convex program (second-order cone program) in Equation (|7]) then provides an efficient and robust 
mechanism to provide an approximate to the signal x. A wide range of optimization techniques (see 
[36], [37], [38] to quote a few) providing fast algorithms have been devised to solve the problem in 
Equation 

II. COMPRESSED SENSING IN ASTRONOMY 

In the next sections, we focus on applying the compressed sensing framework to astronomical remote 
sensing. In Section III-Al we show that compressed sensing and more precisely its way of coding 
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information provides alternatives to astronomical instrument design. Section III-BI gives emphasis on 
the ability of CS decoding to easily account for physical priors thus improving the whole compression 
performances. 

A. A new way of coding signals 

In the compressed sensing framework, the coding step needs a very low computational cost. Compressed 
sensing is then very attractive in several situations : i) narrow transmission band (for remote sensing) 
or/and ii) compressing large amount of data ; for instance in fast scanning or wide field sensing. Indeed, 
in the compressed sensing framework, the way of coding information can impacts on instrumentation in 
two ways as detailed hereafter. 

1) Measuring physics : : The philosophy of compressed sensing (i.e. projecting onto incoherent mea- 
surement ensembles) should be directly applied on the design of the detector. Devising an optical system 
that directly "measures" incoherent projections of the input image would provide a compression system 
that encodes in the analog domain. Compression would be made by the sensor itself ! 
Interestingly, such kind of measurement paradigm is far from being science-fiction. Indeed, in the field of 
7-ray imaging, the so-called coded-maskj^ (see [39] and references therein) are used since the sixties and 
are currently operating in the ESA/Integral space missioifl In 7-ray (high energy) imaging, coded masks 
are used as aperture masks scattering the incoming 7 photons. More formally, the couple (coded aperture 
mask and detector field) is equivalent to selecting some projections in the Fourier space. In coded aperture 
imaging, the way the mask is designed is likely to simulate incoherent projections. Furthermore, 7-ray 
data are often made of point sources that are almost sparse in the pixel domain. Fourier measurements 
then provide near optimal incoherent projections. The first application of compressed sensing then dates 
back to the sixties ! In the compressed sensing community, the coded mask concept has inspired the 
design of the celebrated "compressed sensing camera" [40] that provide effective image compression 
with a single pixel. 

In coded aperture imaging, the decoding step is often performed by iterative techniques based on 

maximum entropy [41]. Applying a sparsity-based recovery technique as advocated by the compressed 

sensing theory would probably provide enhancements. 

^We invite the interested readers to visit the following site that is devoted to coded aperture imaging : http .-//astrophy- 
sics, gsfc. nasa. gov/cai/. 

''See http :// sci.esa.int/science-e/www/area/index.cfm ? fareaid=21. 



2) Coding information : : The second way of applying compressed sensing for astronomical remote 
sensing is more conventional. As illustrated in Figure [H the coding stage mainly computes a few projec- 
tions of the signal x. For the sake of economy, computing these projections should be computationally 
cheap. As stated in Section II-AI good measurements vectors must be incoherent with the basis $ in 
which X is assumed to be sparse. Fortunately, most astronomical data are sparsely represented in a wide 
range of wavelet bases. In that context, as emphasized by Candes in [15], noiselets (see [42]) provide a 
near optimal measurement ensemble for astronomical data. The attractiveness of noiselets is twofold : 

- Low computational cost : on board compression can afford noiselet measurements as a fast transform 
(requiring O (t) flops) is available. 

- Non-adaptive coding : noiselets projections provide near-optimal measurements with most astrono- 
mical data that are sparsely represented in wavelet bases. 

The coding process is non-adaptive : the measurement ensemble may depend on the sparse represen- 
tation $ but not directly on the data x. In this context, the measurement ensemble is efficient for a 
wide class of signals (sparse in the orthobasis $). 



Noiselet 
aiisform 
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y 



Fig. 1. The coding scheme. 



B. Practical signal recovery 

In contrast to the simplicity of the coding step, the decoding step requires a more complex decom- 
pression scheme. As emphasized in Section IH the decoding step is equivalent to solving the inverse 
problem in Equation (|7]). Practical situations involving large scale problems require the use of a fast and 
accurate decoding algorithm. In this Section, we introduce a new fast algorithm for solving the recovery 
problem in Equation ([7]). We particularly focus on the flexibility of the decoding step. Indeed, in the 
compressed sensing framework, the decompression step can account for physical priors thus entailing 
higher performances. 
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Physical priors 

Fig. 2. The decoding scheme. 



1) A practical and effective CS decoding algorithm: The decoding or recovery step amounts to solving 
the following convex program : 

min ||a||£j s.t. ||y — 0A^a|L <e (8) 

The measurement matrix is composed of a subset A indexing M = Card (A) row vectors of the 
orthonormal matrix 0. Let define Ia as the diagonal matrix the entries of which are defined as follows : 

{1 if i e A 
(9) 
otherwise 

where lA[i,«] is the i-th diagonal element of Ia- Let define the signal of size t as follows : 

y{ = y and = (10) 
where A*^ is complement of A in {1, • • • , t}. The problem in Equation ([8]) is then recast as follows : 



mill a L, s.t. 



— IaQ^o; 



<e (11) 



With an appropriate bijective re-parametrization, there exists a constant 7 such that the problem in 
Equation (ITTI ) can be formulated as an augmented Lagrangian : 



a = Arg min — 

a 2 



+ l\\a\\e, (12) 



A wide range of optimization techniques, often based on iterative thresholding, have been proposed to 
solve this problem ( [43], [44] to quote a few). Recently, a general framework [45] for solving such 
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problems has been introduced based on proximal projections. In the light of the proximal forward- 
backward optimization techniques developed in [45], solving the problem in Equation ([T2l ) can be done 
by means of projected Landweber iterative algorithm. At iteration (h), the coefficients a would be updated 
as follows : 



where is the soft-thresholding operator with threshold 7. R is a relaxation descent-direction matrix 
such that the spectral radius of I — MIa©^ is bounded above by 1. Choosing R = $^0^Ia entails 
appreciable simplifications : 



Convergence conditions are given in [45]. 

a) Choosing the regularization parameter 7.- The choice of the regularization parameter is crucial 
as it balances between the sparsity constraint and the how the solution fits the data. Classical approaches 
would advocate the use of cross-validation to estimate an optimal value of 7. Nevertheless, cross-validation 
is computationally expensive and thus not appropriate for large scale problems. 

From a different point of view, solving the initial problem in Equation ([8]l can be done, under mild 
conditions, by homotopy continuation techniques (see [46], [47], [48] and references therein). Such 
techniques iteratively selects coefficients a by managing active sets of coefficients. This kind of process 
has the flavor of iterative hard-thresholding with decreasing threshold 7. Inspired by such techniques, 
the threshold 7 is decreased at each iteration. It starts from 7^^^ = H^^O^yfUoo and decreases towards 
7min- The value of 7min is in the noiseless case. When noise corrupts the data y", 7inin may depend on 
the noise level. In Section JIIJ numerical results are given. In these experiments, noise contamination is 
assumed to be white Gaussian with zero mean and variance o"^. In this case, the final threshold is chosen 
as 7min = 3(T„ which gives an upper bound for noise coefficients with overwhelming probability. 
In practice, substituting the soft-thresholding operator in Equation ([T4b by the hard thresholding operator 
provides better recovery performances. In the forthcoming experiments, we use hard-thresholding rather 
than soft-thresholding. 

b) The ProxIT algorithm: The next panel introduces the ProxIT algorithm. 




(13) 




(14) 
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1 . Set the number of iterations /max and threshold 7'°' = ||*'^0'^y''||oo- a;''^' is set to zero. 

2. While 7^'"' is higher than a given lower bound 7min 

• Compute the measurement projection of s*''"^' : 

• Estimate the current coefficients q*''^ : 

• Get the new estimate of x by reconstructing from the selected coefficients a''"' : 

3. Decrease the threshold 7^''^ following a given strategy. 



c) Remark: Hereafter we enlighten some links between the ProxIT algorithm and previous work. 
When, the measurement ensemble is the canonical basis of M* (G = I), the problem in Equation (fTTI) 
can be equivalently rewritten as follows : 



mm \\a\\i^ s.t. 
a 



- Ma X 

where TWa is a binary mask of size t such that : 



< e where x = (15) 



1 if i G A 

Vie {!,••• MA[i] = { (16) 

otherwise 

This very special case of compressed sensing if equivalent to an interpolation known as inpainting (filling 
holes in x). Interestingly, the ProxIT algorithm has then the flavor of the MCA inpainting algorithm 
introduced in [49]. From that viewpoint, the ProxIT generalizes the former algorithm to a wider range 
of measurement ensembles. 



d) Recovery results: In this Section we provide several recovery results obtained using the ProxIT 
algorithm. In this experiment, the original data x is a 512 x 512 HSllfl image. Like most astronomical 
data, this signal is well {i.e. sparsely) represented in a wavelet basis. Indeed, this kind of data mostly 
contains pointwise singularities (for instance stars or point sources) with smooth diffuse background. As 
stated earlier, choosing an effective measurement ensemble boils down to finding an orthobasis that is 
incoherent with the sparse representation $ (hereafter wavelets). Noiselets (see [42]) are an orthogonal 
basis that is shown to be highly incoherent with a wide range of practical sparse representations (wavelets, 

^See http :// hubblesite.org/. 
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Fourier to quote a few - see [15]). In the following experiment, the data x are projected on a random subset 
of noiselet projections. More precisely, y have been computed by randomly selecting coefficients of ©'^x. 
In the ProxIT algorithm, the sparse representation $ is an undecimated wavelet transform. The left picture 
of Figure |3] shows the original signal x. The picture in the middle features the signal x recovered using 
the ProxIT algorithm from 0.2 * t random noiselet projections. Pictures in Figure |4] depict the zoomed 
version of these images. Visually, the ProxIT algorithm performs well as it provides solutions close to 
the original data x. Both the pointwise structures and more diffuse features (such as the gravitational arc 
visible in Figure are effectively restored. The ProxIT algorithm has been performed on compressed 
signals with varying relative number of noiselet projections (compression rate) p = Card (A) /t. Figure [5] 
features the SNR of the recovery results when p varies from 0.05 to 0.9. The ProxIT algorithm provides 
reasonable solutions for compression rate higher than p = 0.1. This experiment has been performed to 
enlighten the efficiency of the ProxIT algorithm for compressed sensing recovery issues. Performance 
analysis in the framework of the Herschel project are presented in Section Hill 




Fig. 3. Left : Input image of size 512 x 512. Middle : Reconstruction from noiselet-based projections involving 20% of the 
available projections. The ProxIT algorithm has been used with Pmax = 100. Right : Difference between the original image 
and its CS-based reconstruction. 



Comparison with other methods: 

- Linear programming : in the seminal paper [34], the authors proposed to solve the convex -sparse 
decomposition problem in Equation ([8]l with linear programming methods such as interior point 
methods. Several techniques based on linear programming have been developed (see [37], [50] to 
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Fig. 4. Left : Zoom of the input image of size 512 x 512. Middle : Zoom of the reconstruction from noiselet-based projections 
involving 20% of the available projections (Card(A)/f = 0.2). The ProxIT algorithm have been used with Pmax = 100. Right : 
Zoom of the difference between the original image and its CS-based reconstruction. 




'O 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 

Relative number of noiselet projections 



Fig. 5. Recovery Signal-to-noise ratio when the relative number of noiselet projections varies. 



name a few) .Unfortunately, linear programming-based methods are computationally demanding and 
thus not well suited to large-scale problems such as ours. 
- Greedy algorithms : the most popular greedy algorithm must be the Matching Pursuit and its or- 
thogonal version OMP [51]. Conditions have been given under which MP and OMP are proved to 
solve the £i and £o sparse decomposition problems [52], [30], [53]. Greedy algorithms have also 
been proposed by the statistics community for solving variable selection problems (LARS/LASSO 
see [47], [33]). Homotopy-continuation algorithms have also been introduced to solve the sparse 
decomposition problem [46], [54], [55]. Interestingly, a recent work by Donoho [56] sheds light 
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on the links between greedy algorithms such as OMP, variable selection algorithms and homotopy. 
Such greedy algorithms however suffer from high computational cost. 
- Iterative thresholding : recently, iterative thresholding algorithms have been proposed to mitigate the 
greediness of the aforementioned stepwise algorithms. Iterative thresholding has first been introduced 
for solving sparsity-based inverse problems (see [57], [43], [58], [45]). Some techniques based on 
iterative thresholding have been devised for CS (see [59], [38], [36] and references therein). The 
attractiveness of the proposed ProxIT algorithm is its simplicity : i) it is a fast algorithm as computing 
0, $ {resp. 0^, ^^) is performed by using implicit fast synthesis {resp. analysis) transforms ; ii) 
the ProxIT algorithm can easily account for further constraints such as positivity. 

Accounting for physical priors: In this section, we assume that the data x have been compressed using 
compressed sensing. The "observed" data y are then made of M incoherent projections : y = @{^x. 
In the compressed sensing framework, the conventional decompression scheme would require solving 
the problem in Equation ([8]). In real-world applications, further a priori knowledge provides useful 
information to describe the data x. For instance, in astronomical applications, the data x are often photon 
intensity. Positiveness is then a simple physical prior assumption to account for in the decoding step. 
More generally, let assume that the useful data x are observed through an "observation" map ; the 
compressed data y are then recast as follows : 

y = 0A^(x) + n (17) 

where n models projected instrumental noise or model imperfections. The "observation" map can model 
a wide range of physical or instrumental priors : physical generating model, instrumental perturbations 
(convolution, instrumental detector response,- • • etc.) to quote a few. In Section|lIll the "observation" map 
involves image shifts. In this context, accounting for such priors in the decoding step is desirable. The 
problem in Equation ([8]l is then rewritten as follows : 

min||a||£^ s.t. \\y - {^a)\\, < e (18) 

The ProxIT algorithm can be adapted to solve this problem. In case F is linear {i.e. (x) = Fx where 
F is a t X t matrix - for instance, F may model a convolution operator), extending the ProxIT algorithm 
to solve the problem in Equation (fTSl) is straightforward. In case is non linear, the problem at hand gets 



15 

far more difficult and will clearly depend on the expression of T . Note that iterative thresholding-based 
techniques involving special instances of non-linear models have been studied in [60]. In the next section, 
we will consider the case of bijective possibly non-linear maps T . 

To conclude this section, compressed sensing provides an attractive compression scheme : i) the coding 
step is simple with a very low computational cost, ii) the decoding step is able to account for physical 
priors. Compressed sensing then fills the gap between data acquisition and data processing. 



C. Compressed sensing versus Standard compression techniques 

CS-based cornpression have several advantages over standard compression techniques such as the 
celebrated JPEGij compression standard. 

1) Computational complexity: In case compressed sensing is used as a "conventional" compression 
technique, the CS projections (noiselets in the forthcoming examples), require no further encoding in 
contrast to classical compression methods such as JPEG or JPEG2000. Furthermore, the only computa- 
tional cost required by a CS-based compression is the computation of these projections. In case noiselets 
are used, their computational cost evolves as 0{t) thus involving a low CPU load which is lower than 
the computational burden required by JPEG {0{t\og{t))). It can be even much faster if these projections 
are made with an optical system. 

2 ) Decoupling: In contrast to classical compression techniques, there is a complete decouphng between 
the compression and the decompression in the CS framework. Therefore the decompression step can be 
changed while keeping the same compressed data. This could is a very nice property. Indeed, we have 
seen that the quality of the decompressed data is related to the sparsity of the data in a given basis 

If we discover in a few years a new dictionary which leads to a better sparsity of the data, then we can 
still improve the quality of the decompressed data. 

3 ) Data Fusion: In astronomy, remote sensing data involving specific scanning strategies (raster scans) 
often provide redundant information which cannot be accounted for by standard compression techniques. 
For instance, consider that the data are made of 10 images {xi}^^^ ... such that each image Xi is the 
noisy version of the original datum x* : Xi = x* + Ui where rij is a white Gaussian noise with variance 
(7^ = 1 and \/i ^ j; Klmrij} = 0. We assume that the original datum is a faint point source as 
depicted at the top on the left of Figure [6l The SNR of each image Xi is — 26dB. The picture at the 

*See http ://www.jpeg.org/. 
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top on the right of Figure |6] depicts the first observed datum xi. Each image {xi}-^^ ... is compressed 
using JPEG and CS with a compression ratio p = 0.25. The picture at the bottom on the left of Figure |6] 
is the estimate of x* which has been computed has the average of the 10 compressed JPEG data. The 
picture at the bottom on the left in Figure |6] is the CS -based estimate of x* which has been provided by 
using the ProxIT algorithm to solve the following decoding problem : 

10 

mill ||a*||^^ s.t. Wvi - 0A*a''||^^ < e (19) 

i=l 

where x* = ^a* and yi = &AXi. The measurement ensemble is made of noiselets. $ is an isotropic 
undecimated wavelet frame. Clearly, the JPEG compression leads to a catastrophic compression as the 
faint point source is not detectable after compression, while the CS -based compression technique is able 
to retrieve the faint point source as illustrated in Figure |6l 

This huge difference for data fusion problems between both compression strategies is the consequence 
of a fundamental property of CS : the linearity of the compression. In contrast to standard compression 
techniques (such as JPEG), the CS-based compression is linear. The data to transmit are indeed simple 
linear projections : y = 0a(x* + n) where n models instrumental noise. Whatever the compression 
rate (i.e. Card (A) /t), the incoherence between the measurement vectors ©a and the data x is likely to 
guarantee that x* does not belong to the null space of ©a. As a consequence, the compressed data always 
contain a piece of information belonging to x*. Standard compression methods (which are non-linear) 
do not verify this crucial property. For a faint source, a standard compression method will kill its noisy 
high frequencies and they will never be recovered whatever the number of times this source is observed. 
CS will increase the SNR of the source with growing number of observations. Compressed sensing is 
flexible enough to take advantage (in the decompression step) of the redundancy of these kind of data 
to overcome the loss of SNR after compression. 



III. Experiment : the Herschel project 

Herschel is one of the cornerstone missions of the European Space Agency (ESA). This space telescope 
has been designed to observe in the far-infrared and sub-millimeter wavelength range. Its launch is sche- 
duled for the fall of 2008. The shortest wavelength band, 57-210 fim, is covered by PACS (Photodetector 
Array Camera and Spectrometer) [61], which provides low to medium resolution spectroscopy and dual- 
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Fig. 6. Top - left : Input image x* of size 128 x 128. Top - right : First noisy input data x\. White Gaussian noise is added 
with SNR = — 26dB. Bottom-left : Estimate from the average of 10 images compressed by JPEG with a compression rate 
p = 0.25. Bottom-right : Estimate from 10 pictures compressed by CS with a compression rate p = 0.25. 

band photometry. When PACS is used as a photometer, it will simultaneously image with its two bolometer 
arrays, a 64 x 32 and a 32 x 16 matrix, both read out at 40 Hz. The ESA is faced with a challenging 
problem : conventional low-cost compression techniques cannot achieve a satisfactory compression rate. 

In this Section, we propose a new CS-based compression scheme for the Herschel/PACS data that yield 
an elegant and effective way to overcome the Herschel compression dilemma. 

A. The Herschel dilemma 

The Herschel space telescope is partially hampered by the narrowness of the transmission band 
compared to the large amount of data to be transferred. This handicap stems from the limitation of 
conventional compression techniques to provide adequate compression rate with low computational cost, 
given the high readout noise. More quantitatively, the data have to be compressed in real time by a factor 
of 16 with very low CPU power. The lossless compression (classically based on entropy coding) that is 
presently coded on board compresses the data by a factor of 2.5. Up to now, the only acceptable solution 
(with respect to computational cost and quahty) to overcome this need for a higher compression rate is the 
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average of i consecutive images, typically 6 [62]. For pointed observations this strategy is near-optimal 
as it increases the SNR by a factor of s/i without loss of spatial resolution. Moreover, computing the 
mean of i images is clearly computationally very cheap. 

Nevertheless, observing wide sky areas requires fast scanning strategies. In that case, the shift between 
consecutive images may reach approximately A = 1 pixel while the FWHM (full width at half maximum) 
of the instrumental PSF (point spread function) is 5 ~ 3 pixels. Averaging 6 consecutive images yields 
an increase of the equivalent point spread function along the scanning direction thus leading to a loss 
of spatial resolution. This consequence can be catastrophic for some scientific programs. Furthermore, 
averaging is far less optimal for noise reduction as the useful part of the data is also spread when 
consecutive images are averaged. An effective compression scheme would have to balance between the 
following performance criteria : 

- Spatial resolution : fast scan entails a low spatial resolution. An effective compression scheme 
would provide a lower resolution loss. 

- Sensitivity : assuming that between consecutive non-shifted images instrumental noise is inde- 
pendent, averaging provides an optimal SNR. A lower noise ratio provides a higher signal detection 
abiUty. 

B. Compressed sensing for the Herschel data 

The Herschel/PACS mission needs a compression rate equal to 6. A first approach would amount to 
compress independently each image. As stated earlier, the more prior information is accounted for, the 
more effective the compression scheme is. Then, compressing 6 consecutive images jointly would be 
more relevant. If we consider a stack of 6 consecutive images {a;j}i=o,... ,5, the simplest generative model 
is the following : 

VzG {O,--- ,5}; Xi = Tx^{x*)+ni (20) 

where T\. is an operator that shifts the original image x* with a shift Aj. In practice, x* = xq and 
Ao = 0. The signal rij models instrumental noise or model imperfections. According to the compressed 
sensing framework, each signal is projected onto the subspace ranged by a subset of columns of 0. Each 
compressed observation is then obtained as follows : 



Vi € {0, • • • ,5}; Ui = ®KiXi 



(21) 
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where the sets {Aj} are such that : 

^ Ia, = I and Card (A^) = C 

i 

The decoding step amounts to seeking the signal x* as follows : 



(22) 



mm 



in ||*^a||4 s.t. WVi - {x^^Wl^ < and x* > 



(23) 



i=l,- - ,5 

We propose solving this problem by using an adapted version of the ProxIT algorithm we introduced in 
Section II-BI Furthermore, the content of astronomical data is often positive. Constraining the solution 
to be positive would help solving the recovery problem. Assuming that the shifting operator Tx^ is 



invertibk 



we substitute Equation (fT4l) by the following Equatio 



6 



(24) 



i=l,--- ,5 

The positivity constraint is accounted for by projecting at each iteration the solution of the previous 
update equation on the cone generated by the vectors having positive entries : x*^^^ <— Pc (^x*^^^ 
where the projector Pc is defined as follows : 



Vi = l, Pc{x)[^ = { 



x[i] if x[i] > 
otherwise 



(25) 



where Pc (x) [i] is the i-th entry of Pc (x). In the next section, we illustrate the good performances of 
the proposed non-linear decoding scheme. 



Notations 

In the next experiments, the data will made of pointwise sources ; it is worth defining some useful 
notations. Recall that we assume the telescope's PSF to have a FWHM equal to 6. The shift between 
the original datum x* and the i-th datum Xj is Aj. The intensity / of the datum x* is defined as its total 

^This assumption is true when shifting the image does note deteriorate the original signal. 

^Note that if the operator T\. were linear {i.e. T\. (x) = T\.x), then this update would be recast as follows : a;*'''-' = 
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Fig. 7. The proposed Herschel compression scheme. 



flux : 



where x[j] is the j-th entry. We also assume the x* has positive entries. 



(26) 



C. A toy-example 

In the following experiments, the datum x* is a 128 x 128 image. The instrument is assumed to have a 
FWHM (5 = 3 pixels. For the sake of simplicity, each shift \i = i pixels. White Gaussian noise is added 
to account for the instrumental noise. 

1) Detection performances: In this experiment, the datum contains 49 point sources that have been 
uniformly scattered. The amplitude of each point source is generated at random with a Gaussian distri- 
bution. The top-left picture of Figure [8] shows the input data x*. The additive Gaussian noise has a fixed 
unit variance. The top-right panel of Figure [8] features the data x* contaminated with noise. Comparisons 
between the M06 ("Mean of 6 images") and CS methods are made by evaluating for varying intensity 
value (from 700 to 140000 ; it is equivalent to a SNR varying from —13.2 to 33dB) the rate of detected 
point sources. To avoid false detection, the same pre-processing step is performed : i) "a trous" bspline 
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wavelet transform (see [63]), ii) 5aM hard-thresholding^ where cjm is the residual standard deviation 
estimated by a Median Absolute Deviation (MAD) at each wavelet scale, iii) reconstruction. The bottom- 
left panel of Figure [8] features such filtered decoded image using the M06 strategy. The bottom-right 
picture in Figure [8] shows the filtered ProxIT solution. In this experiment the total intensity of the point 
sources is set to 3500. At first sight, both methods provide similar detection performances. As expected, 
the CS-based solution has a better spatial resolution. 

Figure |9] shows the detection rate (with no false detection) of each method for intensities varying from 
/ = 700 to / = 140000. At high intensity (higher than / = 10^), both M06 and CS provide rather 
similar detection performances. Interestingly, at low intensity, CS provides slightly better results. This 
unexpected phenomenon is partly due to the spread that results from the average of shifted images. 
M06 is theoretically (for low shifts) near-optimal for point source detection. In contrast, this experiment 
shows that CS can provide similar or better detection performances than M06. 





Fig. 8. Top left : Original image of size 128 x 128 the total intensity of which is f — 3500. Top right : First input noisy 
map (out of 6). White Gaussian with variance — 1 was added. Bottom left : Mean of the 6 input images. Bottom right : 
Reconstruction from noiselet-based CS projections. The ProxIT algorithm has been used with Pmax = 100. 



Such 5a M is likely to avoid false detection as it defines a rather conservative threshold. 
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Fig. 9. Detection rate when the intensity of the input data varies : Solid line Resolution defined by the Rayleigli criterion 
of the CS-based reconstruction, o : Resolution of the solution provided by the mean of 6 images. 

2) Resolution: Spatial resolution is a crucial instrumental feature. Averaging shifted images clearly 
deteriorates the final spatial resolution of Hershel/PACS. In this experiment, the original datum x* is made 
of a couple of point sources. In the worst case, these point sources are aligned along the scan direction. 
The top-left picture of Figure [TO] features the original signal x*. In the top-right panel of Figure [TOl the 
intensity of the point sources is set to / = 1000 while the noise variance is o",^ = 1. The SNR of the 
data to compress is equal to 2.7dB. The M06 solution (resp. the CS-based solution) is shown on the left 
(resp. right) at the bottom of Figure [TO] As expected, the spatial resolution of the M06 is clearly worse 
than the resolution of the input datum x*. Visually, the CS-based solution mitigate the resolution loss. 
For different intensity of the datum x* (from 100 to 2000), the spatial resolution is evaluated according 
to the Rayleigh criterion. The Rayleigh criterion is the generally accepted criterion for the minimum 
resolvable detail : two point sources are resolved when the first minimum is lower than the amplitude at 
half maximum of a single point source as illustrated in Figure [U] For a fixed intensity /, the resolution 
limit is evaluated by seeking the minimal distance between the point sources for which the Rayleigh 
criterion is verified. For intensities varying from / = 100 to / = 2000, the resolution limit is reported 
in Table 1. 

The CS-based compression scheme provides a solution with better spatial resolution. At high intensity, 
the resolution gain (in comparison with M06) is equal to a third of the instrumental FWHM (1 pixel). 
At low intensity, the resolution gain provided by the CS-based method slightly decreases. 
This experiment shows that CS mitigates the resolution loss resulting from the joint compression of 6 
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consecutive images. 




Fig. 10. Top left : Original image of size 128 x 128 the total intensity of which is / = 1000. Top right : First input noisy 
map (out of 6). White Gaussian with variance cr^ = 1 was added. Bottom left : Mean of the 6 input images. Bottom right : 
Reconstruction from noiselet-based CS projections. The ProxIT algorithm has been used with Pmax = 100. 



0.5 




Fig. 11. The Rayleigh criterion - Left : The point sources are not resolved. Middle : Resolution Umit. Right : Fully 
resolved point sources. 
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D. Realistic data 

1) The data: Real Herschel/PACS data are more complex than those we simulated in the previous 
experiments. The original datum x* is contaminated with a slowly varying "flat field" component cj. In 
a short sequence of 6 consecutive images, the flat field component is almost fixed. In this context, the 
data {xi\i=Q^... 1 can then be modeled as follows : 

Xi = Ta^ {x'') + ni + cf (27) 



Assuming that c/ is known, the ProxIT algorithm can be updated by substituting Equation (|24l) with the 
following : 




yl-lA,e(T,fx*^'^~'U-Cf] )> (28) 




i=l,- ,5 

If Cf is unknown, it can be estimated within the ProxIT algorithm. The next Section focuses on the 
resolution gain provided by the CS- based method in the scope of real Herschel/PACS data. The data 
have been designed by adding realistic pointwise sources to real calibration measurements performed in 
mid-2007. 

2 ) Resolution: Similarly to the experiments performed in Section |III-C.2[ we added a couple of point 
sources to Herschel/PACS data. The top-left picture of Figure [12] features the original signal x*. In the 
top-right panel of Figure [T2j the intensity of the point sources is set to / = 4500. The "flat field" 
component overwhelms the useful part of the data so that x* has at best a level that is 30 times lower 
than the "flat field" component. The M06 solution (resp. the CS-based solution) is shown on the left 
{resp. right) and at the bottom of Figure [12] and all the results are presented in Table 2. Similarly to 
the previous fully simulated experiment, the CS-based algorithm provides better resolution performances. 
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2.7 


4.7 


6.2 
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Intensity 


100 
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1000 
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1500 
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M06 


2.7 


2.7 


2.7 


2.7 


2.7 


2.7 


2.7 


2.7 


2.7 


CS 


2 


2 


1.7 


1.7 


1.7 


1.7 


1.7 


1.7 


1.7 



TABLE I 

Spatial resolution in pixels : for varying datum flux, the resolution limit of each compression 
technique is reported. the cs-based compression entails a resolution gain equal to a 30% of the spatial 

resolution PROVIDED BY M06. 
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The resolution gain can reach 30% of the FWHM of the instrument's PSF for a wide range of signal 
intensities. This experiment illustrates the reliability of the CS-based compression to deal with real-world 
data compression. 




Fig. 12. Top left : Original image of size 32 x 64 with a total intensity of / = 4500. Top right : First input noisy map 
(out of 6). The PACS data already contains approximately Gaussian noise. Bottom left : Mean of the 6 input images. Bottom 
right : Reconstruction from noiselet-based CS projections. The ProxlT algorithm has been used with Pmax = 100. 
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2 


2 


2 


2 


2 


2 



TABLE 11 

Spatial resolution in pixels : for varying datum flux, the resolution limit of each compression 
technique is reported. the cs-based compression entails a resolution gain equal to a 30% of the spatial 

resolution PROVIDED BY M06. 



IV. Conclusion 

In this paper, we overview the potential applications of compressed sensing (CS) in astronomical 
imaging. The CS appeal in astronomy is twofold : i) it provides a very easy and computationally 
cheap coding scheme for on-board astronomical remote sensing, ii) the decoding stage is flexible enough 
to handle physical priors that lead to significant recovery enhancements. This paper introduces a new 
recovery algorithm to deal with the decoding problem. Based on iterative threshold, the ProxIT algorithm 
provides efficient approximate solutions to the decoding problem. Furthermore, the proposed algorithm is 
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easy to handle as it requires setting only a few parameters. We show that the ProxIT algorithm is easily 
adapted to account for physical priors thus entaihng better recovery results. We particularly point out the 
huge advantage of compressed sensing over standard compression techniques in the scope of multiple 
scanning observations (observing the same sky area several times). In this context, CS is able to provide 
astounding recovery results by taking advantage of the redundancy of the data. We have shown that 
compressed sensing data fusion can lead to astounding improvements compared to standard techniques. 
Preliminary numerical experiments illustrate the rehabihty of a CS-based compression scheme in the 
scope of astronomical remote sensing such as the Herschel space mission. We show that compressed 
sensing provides an elegant and effective compression technique that overcome the compression issue 
ESA is faced with. In the next step we will focus on performing more realistic experiments in the scope 
of the Herschel space mission by adding more physical information. 
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